Carnival-combining speech technology and computer animation.

Authors

M A Berger

G Hofer

H Shimodaira

Abstract

Speech is powerful information technology and the basis of human interaction. By emitting streams of buzzing, popping, and hissing noises from our mouths, we transmit thoughts, intentions, and knowledge of the world from one mind to another. We’re accustomed to thinking of speech as an acoustic, auditory phenomenon. However, speech is also visible. Although the primary function of speech is to manipulate air in the vocal tract to produce sound, this action has an ancillary effect of changing the face’s appearance. In particular, the action of the lips and jaw during speech causes constant deformation of the facial surface, generating a robust visual signal highly correlated with the acoustic one: visual speech.

In computer animation, this means that speech is something that must be studied, understood, and simulated. Speech animation, or lip synchronization, is a major challenge to animators, owing to its intrinsic complexity and viewers’ innate sensitivity to the face. (The term lip synchronization, or lip sync, incorrectly implies that only the lips move, whereas almost all the facial surface below the eyes is deformed during speech.) At the same time, demand for lip sync is sharply increasing, in terms of both realism and quantity. Automated solutions are now absolutely necessary. (For more on why this is the case, see the sidebar.)

The past two decades have seen the emergence of techniques for animating speech automatically using speech technology, an interdisciplinary concept called visual speech synthesis. Since the late 1980s, two applications have been in development. Audio-driven animation automatically synthesizes facial animation from audio. Text-driven animation (or audiovisual text-to-speech synthesis) synthesizes both auditory and visual speech from text. The former is used for automatic lip sync with recorded audio, the latter for entirely text-based avatars.
But speech technology and computer graphics remain worlds apart, and the development of visual speech synthesis suffers from lack of a unified conceptual and technological framework. To meet this need, researchers at Speech Graphics (www.speech-graphics.com) and the University of Edinburgh’s Centre for Speech Technology Research (CSTR; www.cstr.ed.ac.uk) are developing Carnival, an object-oriented environment for integrating speech processing with real-time graphics. Carnival comprises an unlimited number of modules that can be dynamically loaded and assembled into a mutable animation production system.
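
The idea of modules that are looked up by name and assembled into a changeable production chain can be illustrated with a small sketch. This is not Carnival's API, which the abstract does not describe. Every name here (`REGISTRY`, `register`, `Pipeline`, `toy_phonemes`, `toy_visemes`) is hypothetical, the two stages are toy placeholders rather than real speech processing, and a plain in-process registry stands in for dynamic loading.

```python
from typing import Callable, Dict, List

# Hypothetical module registry: name -> processing function.
# A real system might load modules from shared libraries or plugins;
# here a dictionary stands in for that mechanism.
ModuleFn = Callable[[dict], dict]
REGISTRY: Dict[str, ModuleFn] = {}


def register(name: str) -> Callable[[ModuleFn], ModuleFn]:
    """Decorator that makes a function available under `name`."""
    def decorator(fn: ModuleFn) -> ModuleFn:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("toy_phonemes")
def toy_phonemes(data: dict) -> dict:
    # Placeholder stage: treats each letter of the text as a "phoneme".
    out = dict(data)
    out["phonemes"] = list(data.get("text", "").replace(" ", ""))
    return out


@register("toy_visemes")
def toy_visemes(data: dict) -> dict:
    # Placeholder stage: maps each "phoneme" to an upper-case "viseme" label.
    out = dict(data)
    out["visemes"] = [p.upper() for p in data.get("phonemes", [])]
    return out


class Pipeline:
    """An ordered, mutable chain of registered modules."""

    def __init__(self) -> None:
        self.stages: List[str] = []

    def add(self, name: str) -> "Pipeline":
        if name not in REGISTRY:
            raise KeyError(f"unknown module: {name}")
        self.stages.append(name)
        return self

    def remove(self, name: str) -> "Pipeline":
        self.stages.remove(name)
        return self

    def run(self, data: dict) -> dict:
        # Pass the data through each stage in order.
        for name in self.stages:
            data = REGISTRY[name](data)
        return data


pipeline = Pipeline().add("toy_phonemes").add("toy_visemes")
result = pipeline.run({"text": "hi"})
```

Because stages are referenced by name, the chain can be reconfigured at run time, for example by calling `pipeline.remove("toy_visemes")` to drop the second stage without touching the first.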


Similar resources

Teaching approaches to Computer Assisted Language Learning

Computers have been used for language teaching ever since the 1960s. Learning a second language is a challenging endeavor, and, for decades now, proponents of computer assisted language learning (CALL) have declared that help is on the horizon. We investigate the suitability of deploying speech technology in computer based systems that can be used to teach foreign language skills. In this case,...


Text to Avatar in Multi-modal Human Computer Interface

In this paper, we present a new text-driven avatar system, which consists of three major components, a text-to-speech (TTS) unit, a speech driven facial animation (SDFA) unit and a text-to-sign language (TTSL) unit. A new visual prosody time control model and an integrated learning framework are proposed to realize synchronization among speech synthesis, face animation and gesture animation, wh...


Roadmaps, journeys and destinations: speculations on the future of speech technology research

This article presents thoughts on the future of speech technology research, and a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote and gesture. A first generation of these perceptive animated interfaces is now under development in a project called the Colorado Literacy Tutor, which uses per...


Generating Coordinated Natural Language and 3D Animations for Complex Spatial Explanations

Dynamically providing students with clear explanations of complex spatial concepts is critical for a broad range of knowledge-based educational and training systems. This calls for a realtime solution that can dynamically create 3D animated explanations that artfully integrate well-chosen speech. We present a visuo-linguistic framework for generating multimedia spatial explanations combining 3D...


Improving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms

One of the important issues in speech emotion recognition is selecting appropriate feature sets in order to improve the detection rate and classification accuracy. In previous studies, researchers tried to select appropriate features for classification by using feature selection and feature-space reduction methods, such as Fisher and PCA. In this research, a hybrid evolutionary algorit...



Journal title:

IEEE Computer Graphics and Applications

Volume 31, Issue 5


Publication date: 2011
